The Feed Must Get Through!

Shlomi Fish on 2006-11-17T20:05:25

After I recently upgraded my local copy of XML::RSS, I discovered that my aggregated feed, which is generated using XML::Feed from the feeds of all my blogs, could no longer be processed correctly by Akregator. When I tried to validate it, I ran into several problems. This meant that we had introduced some regressions into XML::RSS that had to be fixed.

The first problem I encountered was that I got empty <pubDate></pubDate> elements. Looking at the XML::RSS code, I saw that the relevant fields were still initialised to an empty string instead of undef, which caused them to be emitted. In general, the code was in an intermediate state that did not yet include my changes. After merging my local "datetime" branch, I also had to fix some markup injection vulnerabilities that I found, since I had not escaped the contents of some of the tags. Here's the issue with my patch for the whole enchilada.
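To illustrate the two fixes described above (skipping unset fields and escaping tag contents), here is a minimal sketch in Python. It is not the XML::RSS code; the function name render_item and the field list are made up for the example.

```python
from xml.sax.saxutils import escape

def render_item(fields):
    """Render an RSS <item>, skipping unset fields and escaping text.

    Hypothetical helper for illustration only; not part of XML::RSS.
    """
    lines = ["<item>"]
    for tag in ("title", "link", "pubDate"):
        value = fields.get(tag)
        # Treat None and "" alike: an unset field produces no element,
        # rather than an empty <pubDate></pubDate>.
        if value is None or value == "":
            continue
        # escape() replaces &, < and > so the text cannot inject markup.
        lines.append(f"<{tag}>{escape(value)}</{tag}>")
    lines.append("</item>")
    return "\n".join(lines)
```

With this approach, a field left as an empty string behaves the same as one that was never set, and a title such as "a <b> & c" ends up as harmless text.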

The next errors had to do with "guid". In XML::RSS, "permaLink" holds the guid URL if isPermaLink is true, and "guid" holds it if it is false. However, permaLink was equal to 1 instead of the URL. As it turned out, the parsing logic was outdated and had to be fixed. The fix, along with test cases, is in my local repository.
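The mapping described above can be sketched as follows. This is a Python illustration of the rule, not the actual XML::RSS parser; the function name read_guid is hypothetical. It relies on the RSS 2.0 rule that isPermaLink defaults to true when the attribute is absent.

```python
import xml.etree.ElementTree as ET

def read_guid(item_xml):
    """Split an item's <guid> into "permaLink" and "guid" slots.

    Hypothetical helper mirroring the behaviour described in the post.
    """
    item = ET.fromstring(item_xml)
    guid = item.find("guid")
    result = {"permaLink": None, "guid": None}
    if guid is None or not (guid.text or "").strip():
        return result
    value = guid.text.strip()
    # RSS 2.0: isPermaLink is optional and defaults to "true".
    is_permalink = guid.get("isPermaLink", "true").strip().lower() == "true"
    # Store the element's text, never the boolean flag itself.
    result["permaLink" if is_permalink else "guid"] = value
    return result
```

The point of the last assignment is that the slot receives the URL from the element's contents, which is what went wrong when permaLink ended up as 1.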

Next I found out that some of the items were missing their date and time stamp. I noticed that it happened with an RSS 1.0 feed, and as it turned out, the <dc:date> elements were not handled correctly. A close inspection revealed that XML::Feed initialised XML::RSS with version => "2.0", and so, due to changes in the way XML::RSS initialises its modules, the modules were not defined during parsing. So I added a workaround (with a test) so that the extra modules are defined again when parsing. I can't see why version would be useful for anything except output.
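As a rough illustration of why the declared version should not matter when parsing, here is a Python sketch that looks for a date in either form. It is not how XML::RSS or XML::Feed is implemented; item_date is a made-up name, and only the Dublin Core namespace URI is taken from the specification.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by <dc:date> in RSS 1.0 feeds.
DC = "http://purl.org/dc/elements/1.1/"

def item_date(item):
    """Return an item's date string from <pubDate> or <dc:date>, if any.

    Hypothetical helper: it checks both element names regardless of
    which RSS version the caller believes the feed to be.
    """
    for tag in ("pubDate", f"{{{DC}}}date"):
        element = item.find(tag)
        if element is not None and element.text:
            return element.text.strip()
    return None
```

Because the lookup is keyed on the namespace URI rather than on a version setting, an RSS 1.0 item keeps its date even when the parser object was created with a 2.0 default.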

Afterwards, the feed validated, and Akregator could read it. I had a lot of other plans for today, which had to be delayed because of this work on XML::RSS. But a hacker's got to do what a hacker's got to do.


API design

Aristotle on 2006-11-18T12:30:52

In XML::RSS "permaLink" holds the guid URL if isPermaLink is true, and "guid" holds it if it is false.

Sounds like bad API design to me. permaLink should always be the permalink, if there is one, and guid should always be the GUID, if there is one. How this is specified in the wire format is something the API should not expose.

I don’t know if you’re at liberty to make such changes, though.

(In fact, I boggle at the effort you’re putting into a module for RSS… Atom’s where it’s at, nowadays. Though I guess we need solid code for consuming the RSS that’s sadly already around.)

Re:API design

Shlomi Fish on 2006-11-18T16:04:15

Sounds like bad API design to me. permaLink should always be the permalink, if there is one, and guid should always be the GUID, if there is one. How this is specified in the wire format is something the API should not expose.

It is bad API design in my opinion. But this has been the API since XML::RSS 1.05. BTW, in RSS 2.0 the guid element has an isPermaLink attribute, which can be "true" or "false". If it is true, then permaLink will hold the contents of the "guid" tag, and if it's false, then "guid" will.

I don’t know if you’re at liberty to make such changes, though.

Not at the moment, no.

In fact, I boggle at the effort you’re putting into a module for RSS… Atom’s where it’s at, nowadays. Though I guess we need solid code for consuming the RSS that’s sadly already around.

Indeed. I haven't taken a close look at Atom yet. But I'm making use of XML::RSS personally, and so are many other people who generate or parse the various versions of RSS. My work on XML::RSS started when I saw that a feed I generated did not validate, and decided to fix it. Since then it's been quite an obsession for me to work on it in my copious free time.

Re:API design

Aristotle on 2006-11-18T17:59:14

Atom’s just RSS without the bugs.

Re:API design

ask on 2006-11-19T08:01:51

> Sounds like bad API design to me. [....]

Indeed. XML::RSS is mostly a big messy patchwork.

> I don’t know if you’re at liberty to make such changes, though.

The current focus is to slowly get the test coverage up and bugs fixed; when we have good coverage we can refactor the code and the API (while staying compatible with the old one). As you point out, there really isn't much need for innovations in an RSS module.

I got sucked into looking after the module after finding a bug (like Shlomi) and back then finding nobody there to take my patch. :-)

  - ask